Skip to content

14 html escaping and unescaping is slow and incomplete#17

Merged
lupodevelop merged 11 commits intomainfrom
14-html-escaping-and-unescaping-is-slow-and-incomplete
Jan 9, 2026
Merged

14 html escaping and unescaping is slow and incomplete#17
lupodevelop merged 11 commits intomainfrom
14-html-escaping-and-unescaping-is-slow-and-incomplete

Conversation

@lupodevelop
Copy link
Owner

This pull request introduces significant improvements to HTML escaping and unescaping in the str library, replacing custom implementations with the houdini and odysseus libraries for better performance and comprehensive entity support. It also adds new dependencies, updates documentation, and includes extensive new tests for HTML entity handling.

HTML escaping/unescaping improvements:

  • Replaced the custom escape_html implementation in src/str/core.gleam with houdini.escape for faster, allocation-friendly HTML escaping.
  • Replaced the custom unescape_html implementation in src/str/core.gleam with odysseus.unescape, supporting named entities as well as numeric decimal and hex entities.
  • Added houdini and odysseus as dependencies in gleam.toml.
  • Updated the changelog and README to document these changes and the expanded entity support. [1] [2]

Testing enhancements:

  • Added comprehensive unit and fuzz tests for HTML escape/unescape, including roundtrip, numeric entities, malformed input, and Unicode/emoji cases in test/str_html_escape_test.gleam, test/str_html_escape_extended_test.gleam, and test/str_html_escape_fuzz_test.gleam. [1] [2] [3]

Documentation and workflow updates:

  • Added a new CONTRIBUTING.md with guidelines for contributing, testing, and PRs.
  • Updated the logo pointer in README.md to use a raw GitHub URL for Hexdocs compatibility.
  • Added a GitHub Actions CI workflow for running tests and formatting checks. (.github/workflows/ci.yml)

Updated gleam.toml and manifest.toml to include houdini and odysseus as new dependencies. Bumped package version to 1.2.3 to reflect these changes.
Replaces manual HTML escaping and unescaping logic with calls to the houdini and odysseus libraries, simplifying the code and improving maintainability.
Introduces unit tests for str.escape_html and str.unescape_html, covering basic escaping, unescaping, roundtrip conversions, and numeric entity handling.
Documented changes for version 1.2.3, including replacement of HTML escape/unescape implementations, new dependencies, and added tests. Contributions and suggestions are also acknowledged.
Changed the README logo image source to use a raw.githubusercontent URL for better Hexdocs compatibility. Added a note about escape_html and unescape_html library changes in version 1.2.3. Updated CHANGELOG to document these changes and credit the suggestion.
Introduces a GitHub Actions workflow to run tests and formatting checks on push and pull request events for master and main branches. The workflow sets up the BEAM environment, installs dependencies, runs tests, and checks code formatting.
Introduces a CONTRIBUTING.md file outlining steps for contributing, commit conventions, PR checklist, setup instructions, and testing recommendations for the project.
Introduces comprehensive tests for str.escape_html and str.unescape_html, covering basic entities, numeric and named entities, malformed and unknown entities, combined entities, Unicode/emoji roundtrips, and idempotence/double escaping.
Moved the gleam/list import to the top and reformatted a long assertion in malformed_and_unknown_entity_test for improved readability. (gleam format)
Deleted the 'links' field from gleam.toml as it is no longer needed. This cleans up the configuration file.
Introduces a deterministic fuzz test to verify that str.escape_html and str.unescape_html roundtrip correctly and that the escaped output does not contain raw angle brackets or quotes.
Copilot AI review requested due to automatic review settings January 9, 2026 09:48
@lupodevelop lupodevelop linked an issue Jan 9, 2026 that may be closed by this pull request
@lupodevelop lupodevelop merged commit 994f955 into main Jan 9, 2026
2 of 3 checks passed
@lupodevelop lupodevelop deleted the 14-html-escaping-and-unescaping-is-slow-and-incomplete branch January 9, 2026 09:48
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request replaces custom HTML escaping/unescaping implementations with specialized libraries (houdini and odysseus) to improve performance and provide comprehensive HTML entity support, while adding extensive tests and CI infrastructure.

  • Replaces custom escape_html and unescape_html implementations with library calls for better performance and comprehensive entity support
  • Adds comprehensive test coverage including basic, extended, and fuzz tests for HTML entity handling
  • Adds CI workflow, contributing guidelines, and updates documentation

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/str/core.gleam Replaces custom HTML escape/unescape implementations with houdini.escape and odysseus.unescape library calls
gleam.toml Adds houdini and odysseus dependencies; bumps version to 1.2.3
manifest.toml Adds package manifests for new dependencies
test/str_html_escape_test.gleam Adds basic unit tests for HTML escape/unescape with roundtrip and numeric entity tests
test/str_html_escape_extended_test.gleam Adds extended tests for edge cases including malformed entities, Unicode, and idempotence
test/str_html_escape_fuzz_test.gleam Adds deterministic fuzz testing for HTML escape/unescape roundtrips
README.md Updates logo URL to raw GitHub URL and documents the library changes
CHANGELOG.md Documents the 1.2.3 release changes
CONTRIBUTING.md Adds contributing guidelines with setup, commit conventions, and PR checklist
.github/workflows/ci.yml Adds GitHub Actions workflow for automated testing and formatting checks

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

assert string.contains(escaped, "\"") == False
assert string.contains(escaped, "'") == False

True
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function returns True but test functions in gleeunit should return Nil. Additionally, the run_cfg calls on lines 65-67 have return values that are ignored. Consider removing the return value from both run_cfg and this test function.

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +26
list.fold(cases, True, fn(_, s) {
let escaped = str.escape_html(s)
let unescaped = str.unescape_html(escaped)
assert unescaped == s
True
})
}

Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first parameter in the fold function is unused (indicated by the underscore). The function returns True but this return value is never used. The entire fold seems unnecessary here - consider using list.each instead, or restructure to avoid the unused parameter and return value.

Suggested change
list.fold(cases, True, fn(_, s) {
let escaped = str.escape_html(s)
let unescaped = str.unescape_html(escaped)
assert unescaped == s
True
})
}
list.each(cases, fn(s) {
let escaped = str.escape_html(s)
let unescaped = str.unescape_html(escaped)
assert unescaped == s
})
}

Copilot uses AI. Check for mistakes.
- [ ] Tests added/updated
- [ ] `gleam format` & `gleam test` pass
- [ ] Update `CHANGELOG.md` if behaviour changes
- [ ] Document noteworthy changes in `README.md` , docs/ or examples/
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an extra space before the comma in "README.md , docs/". It should be "README.md, docs/" without the space before the comma.

Suggested change
- [ ] Document noteworthy changes in `README.md` , docs/ or examples/
- [ ] Document noteworthy changes in `README.md`, docs/ or examples/

Copilot uses AI. Check for mistakes.
python3 scripts/generate_character_tables.py
```

Note: as of **1.2.3**, `escape_html` now uses the `houdini` library for fast, allocation‑friendly escaping, and `unescape_html` uses `odysseus` for comprehensive entity support (named, decimal and hex numeric entities). See [CHANGELOG.md](CHANGELOG.md) for details.
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hyphen in "allocation‑friendly" is a non-breaking hyphen (U+2011) instead of a regular ASCII hyphen. This is inconsistent with the same term in CHANGELOG.md line 8 which uses a regular hyphen. Consider using a regular hyphen for consistency.

Suggested change
Note: as of **1.2.3**, `escape_html` now uses the `houdini` library for fast, allocationfriendly escaping, and `unescape_html` uses `odysseus` for comprehensive entity support (named, decimal and hex numeric entities). See [CHANGELOG.md](CHANGELOG.md) for details.
Note: as of **1.2.3**, `escape_html` now uses the `houdini` library for fast, allocation-friendly escaping, and `unescape_html` uses `odysseus` for comprehensive entity support (named, decimal and hex numeric entities). See [CHANGELOG.md](CHANGELOG.md) for details.

Copilot uses AI. Check for mistakes.
run_cfg(42, 50, tokens)
run_cfg(123, 200, tokens)

True
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test function returns True but this value is never used. In Gleam tests using gleeunit, test functions should return Nil. The function should not return True at the end.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HTML escaping and unescaping is slow and incomplete

2 participants